Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FPM: For pm = ondemand, don't respawn processes that reached pm.max_requests #4101

Closed

Conversation

mpdude
Copy link

@mpdude mpdude commented May 2, 2019

This addresses https://bugs.php.net/bug.php?id=77959.

In short: When using pm = ondemand and there is a short peak in load, many worker processes will be spawned. As workers will be used in a round-robin fashion, even for a lightly loaded server it may happen that every of those workers will be used before pm.process_idle_timeout is reached. In this case, the numer of workers will never decrease again as every process reaching pm.max_requests will immediately be replaced by a new one.

With this PR, for ondemand workers that reached pm.max_requests will not automatically be replaced, but a new process will only be forked if deemed necessary by the process management.

Please be lenient with me – this is my first php-src contribution.

@KalleZ
Copy link
Member

KalleZ commented May 2, 2019

cc @bukka

@KalleZ KalleZ requested a review from bukka May 2, 2019 20:24
@mpdude
Copy link
Author

mpdude commented May 2, 2019

Could someone please point me to the place where the worker is chosen for an incoming request? Or does that happen somewhere in a syscall (select(), epoll?) where the kernel chooses one of several processes waiting for a file descriptor or so?

@php-pulls
Copy link

php-pulls commented May 3, 2019 via email

@mpdude
Copy link
Author

mpdude commented May 3, 2019

It is mostly epoll() these days, not select(), but yes, whether it is select, poll or epoll, we give the kernel a set of file descriptors and it chooses one.

With my very limited C and Linux programming knowledge, I attached strace to a few running worker processes and found that they block in the accept() syscall until a connection comes in.

So, am I right assuming that the fact incoming requests are handled by workers in a round-robin fashion is an implementation detail of this syscall and nothing we could easily change?

@bukka
Copy link
Member

bukka commented May 3, 2019

Yeah basically all workers (children in the pool) share the open socket for the pool. They all call accept so they are in the same wait queue which is handled by kernel. There is no event loop for workers as we are talking about multiple independent processes. The event pool is used only in master for workers management and other stuff.

I'm not sure about this PR. I see what you are trying to do - basically using max_requests as a reducer but it will happen independently on the current load which I don't think is the right solution. Also pm.max_request should not be used unless you have a leaking application so it's not really meant for this purpose. I think we might agree that it's kind of a hack... :)

Having a good scaling down mechanism is not easy though. It should be based on the actual load which might be quite difficult to implement right in FPM...

@mpdude
Copy link
Author

mpdude commented May 4, 2019

@bukka Thank you for your detailed explanation. And yes, I agree it’s not a good solution.

Regarding the ondemand model, the only mechanism to scale down is the idle timeout. But workers run round-robin. So as long as your system receives number_of_current_workers requests per max_idle_time seconds, they don’t time out.

If there has been a spike in load, a large number of workers may have been forked that will never go away – even tough if just very few of them would be able to handle the load.

https://blog.cloudflare.com/the-sad-state-of-linux-socket-balancing/ suggests that epoll() can be used on the listening socket to make the last busy worker be the preferred one to receive the next request, which I think could help here.

What do you think about that?

Am I right that the accept() in question is in the FCGI code? Will epoll() cause portability problems?

@bukka
Copy link
Member

bukka commented May 5, 2019

Yeah it looks that current process idle logic for ondemand would need a FIFO (epoll) mechanism to work correctly as it will be picked up in round robin fashion. The only question is how it effects user with a big load - it can potentially lead to overloading single worker. So I think that behavior should be initially optional.

It needs to be portable as FPM is also used in FreeBSD, OpenBSD and possibly other platform that don't have epoll. So ideally it should use fpm events that would need to run in the worker. The accept is in fcgi_accept_request which is in FCGI code but the loop is in fpm_main.c so it should be possible to implement it without touching FCGI code as the listening socket is available (fcgi_fd).

@mpdude
Copy link
Author

mpdude commented May 5, 2019

Why would it make a difference if workers accept() round-robin or FIFO? I mean, the first idle one should pick the connection. So does it make a difference if the oldest worker gets to handle most of the connections?

@bukka
Copy link
Member

bukka commented May 26, 2019

I guess some users might want to spread load more evenly between cores especially for other modes than ondemand. The blog article that you linked mentions such cases (a little bit more info is in discussion) so some people might prefer it. As I said I think that FIFO definitely makes much more sense for ondemand and I think it should be default. But it would be good to have an option to switch to the current way (accept only).

All of this is something that I would consider only for the next minor as it's a bigger change (currently 7.4) that I wouldn't risk in a bug fixing version.

@mpdude
Copy link
Author

mpdude commented Jan 9, 2020

I have little experience in C programming, in particular at the lower level relevant here and when it comes to portability. I would, however, see it as a challenge and see if I can learn what is necessary.

The question is – would anyone here mentor and guide me and maybe show the first steps that seem appropriate?

@bukka bukka added the SAPI: fpm label Jun 9, 2020
@bukka
Copy link
Member

bukka commented Jun 9, 2020

@mpdude Sorry completely missed the previous comment and was also a bit busy at that time. I'd be happy to help with reviews and suggesting things if you are still interested.

@ramsey
Copy link
Member

ramsey commented May 27, 2022

@mpdude Is this something you're still interested in pursuing? If so, it will need to be updated to target PHP-8.0. If not, please feel free to close this PR. Thanks!

@iluuu1994
Copy link
Member

Closing this due to a lack of response. Feel free to reopen if work continues.

@iluuu1994 iluuu1994 closed this Jun 27, 2022
@mpdude
Copy link
Author

mpdude commented Jun 28, 2022

Thanks for all your comments.

I am afraid I am unable to bring this forward, since I have only basic C skills and no experience in programming on such a (for me) low level, let alone a development environment where I could work and debug this effectively. As for everyone in open source, my time available is limited and so I'll have to pick other items where I can contribute better. Sorry!

@iluuu1994
Copy link
Member

@mpdude No need to apologize, thanks for your efforts!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants